Engine Repair at 60 MPH

2016/11/20

You can't stop the car to do repairs, so...

As I mentioned in prior posts, complex systems are characterized by their dependencies - both internal and external.

In practical terms, this means that changes to one part of a complex system have the potential for unforeseen consequences in other parts of the system.

This leads to two common approaches to system maintenance. Sadly, both of these approaches ultimately create unacceptable problems:

1. Additive Change

In this approach, developers and managers continuously make the decision that it is safer to "patch", "tweak", or "adjust" a critical part of the system, rather than refactor or replace it.

This "safe" approach involves adding yet another layer to the system or component, and generally creates more complexity and more dependencies. This reduces the maintainability of the system, while increasing technical debt.

Ironically, "patching" tends to be applied to the most actively changing parts of the system. This results in the greatest accumulation of technical debt in precisely the areas that need the greatest maintenance velocity!

2. Monolithic Development and Deployment

In this approach, developers and managers decide that the only "safe" approach is to treat the entire system as a single unit (since there are so many dependencies).

This perspective leads to laziness with respect to object-oriented (OO) design and practices, and is characterized by the following symptoms:

This approach severely reduces the velocity of development and deployment, while increasing technical debt.

A Better Approach

To reduce complexity, it is critical to eliminate dependencies wherever possible. This is the key to creating components that are maintainable, can be owned by small teams, and are independent of other components, including having their own life cycle. The ultimate result is parallel development, maintenance, and deployment.

However, an important consideration is that we need to do this while reducing or eliminating disruption to ongoing operations.

In other words, we want to do engine repair on our car while racing down the highway at 60 mph!

There is an incremental step-by-step way to accomplish this (without crashing).

By The Numbers

The following steps allow us to reduce the complexity of our systems, in a controlled and safe way:

1. Discover

Identify parts of the system that have the potential for independence. Examples include Database, User Interfaces (front-end), Business Logic, External Interfaces, Analytics, Transaction Processing, Data Transformation, etc.

Currently, there may be large numbers of dependencies between these parts of your system. Our goal in this step is just to identify the ideal components (concerns). We will address the dependencies later.
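As a rough illustration of this step, the sketch below tallies the dependencies that cross the boundaries of candidate components. The module names, the component assignments, and the dependency list are all hypothetical; in practice they would come from an inventory of your own system.

```python
from collections import Counter

# Hypothetical mapping of existing modules to the ideal component (concern)
# each one belongs to.
COMPONENT_OF = {
    "orders_ui": "User Interface",
    "orders_logic": "Business Logic",
    "orders_db": "Database",
    "reports": "Analytics",
}

# Hypothetical (module, depends_on) pairs gathered from the existing system.
DEPENDENCIES = [
    ("orders_ui", "orders_logic"),
    ("orders_ui", "orders_db"),
    ("orders_logic", "orders_db"),
    ("reports", "orders_db"),
    ("reports", "orders_logic"),
]


def cross_component_counts(component_of, dependencies):
    """Count dependencies whose two ends fall in different components."""
    counts = Counter()
    for source, target in dependencies:
        pair = (component_of[source], component_of[target])
        if pair[0] != pair[1]:
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    counts = cross_component_counts(COMPONENT_OF, DEPENDENCIES)
    for (src, dst), n in counts.items():
        print(f"{src} -> {dst}: {n}")
```

The output is only a map of where the coupling lives. Nothing is changed at this stage.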

2. Decide

With the key components of the system identified, make a decision for each one: either create a new (virtually empty) component or keep using the existing component. This decision depends on how much technical debt the existing component carries. Components that are extremely debt-ridden should be flagged for replacement rather than kept.

The key point here is that highly debt-ridden components cannot easily be refactored without risk to ongoing operations. In these cases, it is better to make a new component that will exist (for a time) in parallel with the old component. These new components should start life virtually empty - not being used by the system (even after initial deployment to production).

3. Connect

At this point, all components will have been identified. Some of these components will already be in use by the system (because they existed previously), and some components will be new - and not yet being used.

Next, design and implement clean interfaces between your components. These will become the "contracts" used by developers going forward, allowing components (and their associated teams) to work independently and in parallel, each coding against the interface.

A key point here is that these interfaces should be minimal - as thin as you can possibly make them.

Any newly implemented interfaces will not need to be used until later.
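To make the idea of a thin contract concrete, here is a minimal sketch. The TransactionProcessor interface, the legacy routine, and both implementations are hypothetical names invented for illustration; the point is only that callers depend on a single small operation, and that the new component can exist before anything uses it.

```python
from typing import Protocol


class TransactionProcessor(Protocol):
    """Hypothetical thin contract: one operation, nothing else."""

    def process(self, account_id: str, amount_cents: int) -> bool: ...


def legacy_post_transaction(record: dict) -> int:
    """Stand-in for an existing legacy routine (hypothetical); 0 means success."""
    return 0 if record["amt"] > 0 else 1


class LegacyProcessor:
    """Adapter that exposes the legacy routine through the thin contract."""

    def process(self, account_id: str, amount_cents: int) -> bool:
        record = {"acct": account_id, "amt": amount_cents}
        return legacy_post_transaction(record) == 0


class NewProcessor:
    """New component: starts life virtually empty and is not yet used."""

    def process(self, account_id: str, amount_cents: int) -> bool:
        raise NotImplementedError("not yet in use")


def settle(processor: TransactionProcessor, account_id: str, amount_cents: int) -> str:
    """A caller that depends only on the contract, not on a concrete component."""
    return "ok" if processor.process(account_id, amount_cents) else "rejected"


if __name__ == "__main__":
    print(settle(LegacyProcessor(), "acct-1", 500))
```

Because settle only knows about the contract, swapping LegacyProcessor for a finished NewProcessor later requires no change on the calling side.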

4. Change

With the components and interfaces identified and/or implemented, there is now a framework for incremental, non-disruptive change.

The main point is that there should now be well-understood interfaces and components that are ready to receive legacy code and iterative changes that will reduce dependencies and the resulting complexity and technical debt.

Next, incrementally make the following types of changes:

An important point is that these steps can be done incrementally and deployed to the production environment, even if the new code or new functionality is not yet being used. This avoids massive testing efforts and massive switch-overs.
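One common way to ship code that is deployed but not yet used is a flag that defaults to off. The sketch below assumes a hypothetical in-memory flag store and a made-up tax calculation; a real system would more likely read the flag from configuration.

```python
# Hypothetical flag store. Off by default: the new code is deployed but dormant.
FLAGS = {"use_new_tax_calculation": False}


def legacy_tax(amount_cents: int) -> int:
    """Existing behavior (stand-in): 8% tax, truncated to whole cents."""
    return amount_cents * 8 // 100


def new_tax(amount_cents: int) -> int:
    """New code path (stand-in): 8% tax, rounded to the nearest cent."""
    return (amount_cents * 8 + 50) // 100


def tax(amount_cents: int) -> int:
    """Single entry point; the flag decides which implementation runs."""
    if FLAGS["use_new_tax_calculation"]:
        return new_tax(amount_cents)
    return legacy_tax(amount_cents)


if __name__ == "__main__":
    print(tax(1010))  # legacy path while the flag is off
    FLAGS["use_new_tax_calculation"] = True
    print(tax(1010))  # new path once the flag is turned on
```

With the flag off, production behavior is unchanged even though new_tax has been released alongside the legacy code.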

Later, relatively small subsets of system functionality can be directed to the incrementally forming new infrastructure. This allows confidence in the new components and structure to grow gradually and in a controlled, non-disruptive fashion.

Another important point is that this process allows teams and systems to gradually become adapted to a more incremental and parallel mode of development and deployment.

5. Use

With a growing new infrastructure (existing in parallel with the legacy structures), it will soon become possible to redirect a subset of functionality to new or improved components.

This may take the form of a single customer/client (who may have requested new functionality), or it may take the form of a subset of system functionality, such as backup, transaction processing, table maintenance, an external interface, etc.

The key point is that system functionality can be switched over gradually, in a controlled manner, and with the ability to immediately switch back if there are any issues.
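A gradual switch-over with an immediate way back can be sketched as a small router. The Router class, the client IDs, and the two handlers below are hypothetical; the idea is simply that only an explicitly chosen subset of clients reaches the new component, and that any of them can be moved back with a single call.

```python
class Router:
    """Routes a chosen subset of clients to the new component (hypothetical)."""

    def __init__(self, legacy_handler, new_handler):
        self._legacy = legacy_handler
        self._new = new_handler
        self._clients_on_new = set()

    def switch_over(self, client_id):
        """Start sending this client's requests to the new component."""
        self._clients_on_new.add(client_id)

    def switch_back(self, client_id):
        """Immediately return this client to the legacy component."""
        self._clients_on_new.discard(client_id)

    def handle(self, client_id, request):
        handler = self._new if client_id in self._clients_on_new else self._legacy
        return handler(request)


def legacy_handler(request):
    return f"legacy:{request}"


def new_handler(request):
    return f"new:{request}"


if __name__ == "__main__":
    router = Router(legacy_handler, new_handler)
    router.switch_over("client-42")
    print(router.handle("client-42", "backup"))
    print(router.handle("client-7", "backup"))
    router.switch_back("client-42")
    print(router.handle("client-42", "backup"))
```

The same shape works when the subset is a piece of functionality rather than a client: key the routing set on the operation name instead of the client ID.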

This mitigates the risk associated with change. It opens the door to more confidence in accommodating change and allows teams and managers to become less conservative about it. Ultimately, "patching" becomes largely a thing of the past: a last resort, used only temporarily.

6. Retire

Over time, parts of the existing (legacy) structures and functionality will become unused. These parts of the system may be left in place as long as is needed to cement confidence in new components and functionality.

Eventually, these (debt-ridden and now obsolete) parts of the system can be retired (removed).
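One way to know when a legacy part has actually become unused is to count calls to it for a while before removing it. The decorator and the function names below are hypothetical, and a real system would probably record usage in logs or metrics rather than in an in-process counter.

```python
from collections import Counter
from functools import wraps

# In-process usage counts for legacy entry points (illustrative only).
LEGACY_CALLS = Counter()


def track_legacy(func):
    """Record every call to a legacy function before delegating to it."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        LEGACY_CALLS[func.__name__] += 1
        return func(*args, **kwargs)

    return wrapper


@track_legacy
def legacy_backup():
    return "backed up (legacy)"


@track_legacy
def legacy_table_maintenance():
    return "maintained (legacy)"


def retirement_candidates(tracked_names):
    """Return the tracked legacy functions that were never called."""
    return sorted(name for name in tracked_names if LEGACY_CALLS[name] == 0)


if __name__ == "__main__":
    legacy_backup()
    print(retirement_candidates(["legacy_backup", "legacy_table_maintenance"]))
```

A function that stays on the candidate list for as long as it takes to build confidence is a reasonable one to retire.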

Next Time

In my next post, I will present another powerful approach for reducing complexity that typically allows a significant number of objects to be removed from the system, while decreasing maintenance and subsequently improving time-to-market for new features and functionality.